Uni- and bivariate data
transformations using R
 

SIMP59: Data Selection and Visualisation
7.5 credits VT25

nils.holmberg@iko.lu.se

Overview

  • rmarkdown notebooks
  • 7.2 Import dataframes
  • 3.2 Rows
  • 3.3 Columns
  • missing values, outliers
  • filter and select data
  • 3.4 The pipe (dplyr)
  • 5.2 Tidy data
  • 3.5 Groups
  • 3.6 Aggregates
  • 5.3 Lengthening data
  • 5.4 Widening data
  • summarizing data
  • export dataframe
  • 1.4 Visualizing data
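
The row, column, pipe, group and aggregate steps listed above can be sketched as one short dplyr pipeline. This is a minimal illustration, not course material: the `survey` data frame and its column names are made up for the example, and the native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical data: names and values are illustrative only
survey <- data.frame(
  id    = 1:6,
  group = c("a", "a", "a", "b", "b", "b"),
  score = c(10, 12, NA, 20, 22, 24),
  age   = c(25, 31, 28, 40, 35, 29)
)

summary_tbl <- survey |>
  filter(!is.na(score)) |>   # rows: drop observations with a missing score
  select(group, score) |>    # columns: keep only the variables we need
  group_by(group) |>         # groups: one group per level of `group`
  summarize(                 # aggregates: one output row per group
    n          = n(),
    mean_score = mean(score),
    .groups    = "drop"
  )

summary_tbl
```

Each verb takes a data frame and returns a data frame, which is what makes the steps chainable with the pipe.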

whole

A diagram displaying the data science cycle: Import -> Tidy -> Understand (which has the phases Transform -> Visualize -> Model in a cycle) -> Communicate. Surrounding all of these is Program. Import, Tidy, Transform, and Visualize are highlighted.

Figure 1: In this section of the book, you’ll learn how to import, tidy, transform, and visualize data.

Course literature

Wickham, Çetinkaya-Rundel, and Grolemund (2023)

Wilke (2019)

Watt and Naidoo (n.d.)

rmarkdown, scripts

import

  • 20 Spreadsheets
  • 21 Databases
  • 22 Arrow
  • 23 Hierarchical data
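
The chapters above cover more specialised sources. As a simpler starting point, a flat-file round trip (export, then import) might look like the sketch below. It assumes the readr package is installed; the `measurements` data frame is hypothetical, and a temporary file stands in for a real path.

```r
library(readr)

# Hypothetical data frame to export
measurements <- data.frame(
  id    = 1:3,
  label = c("x", "y", "z"),
  value = c(1.5, NA, 3.0)
)

# Export: write_csv() writes missing values as the string "NA" by default
path <- tempfile(fileext = ".csv")
write_csv(measurements, path)

# Import: read_csv() guesses column types and reads "NA" back as missing
imported <- read_csv(path, show_col_types = FALSE)

imported
```

Checking the imported column types and missing values right after reading is a cheap way to catch parsing surprises early.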

transform

  • 12 Logical vectors
  • 13 Numbers
  • 14 Strings
  • 15 Regular expressions
  • 16 Factors
  • 17 Dates and times
  • 18 Missing values
  • 19 Joins
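
Three of these topics (joins, logical vectors, missing values) often appear together in practice. The sketch below is one possible illustration using dplyr; the `people` and `cities` tables and all their values are invented for the example.

```r
library(dplyr)

# Hypothetical lookup scenario: "XX" has no match in `cities`
people <- data.frame(id = 1:3, city_code = c("LU", "KA", "XX"))
cities <- data.frame(city_code = c("LU", "KA"), city = c("Lund", "Kalmar"))

joined <- people |>
  left_join(cities, by = "city_code") |>  # join: keep every row of `people`
  mutate(
    has_city = !is.na(city),              # logical vector: was a match found?
    city     = coalesce(city, "unknown")  # missing values: fill with a default
  )

joined
```

Because `mutate()` evaluates its arguments in order, `has_city` is computed before the missing city is replaced.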

figure, pivot

A diagram showing how `pivot_longer()` transforms a simple data set, using color to highlight how column names ("bp1" and "bp2") become the values in a new `measurement` column. They are repeated three times because there were three rows in the input.

Figure 2: The column names of pivoted columns become values in a new column. The values need to be repeated once for each row of the original dataset.
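
The pivot described in Figure 2 can be reproduced with a small sketch using tidyr. The column names `bp1`, `bp2` and `measurement` come from the figure description; the `id` column, the `value` column name and the numbers themselves are illustrative assumptions.

```r
library(tidyr)

# Three rows and two measurement columns, as in the figure description
df <- data.frame(
  id  = c("A", "B", "C"),
  bp1 = c(100, 140, 120),
  bp2 = c(120, 115, 125)
)

# Lengthen: column names bp1/bp2 become values in `measurement`
long <- df |>
  pivot_longer(
    cols      = bp1:bp2,
    names_to  = "measurement",
    values_to = "value"
  )

# Widen: the inverse operation restores one column per measurement
wide <- long |>
  pivot_wider(names_from = measurement, values_from = value)

long
wide
```

The long result has six rows (three input rows times two pivoted columns), with "bp1" and "bp2" repeated once per original row.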

Palmer Penguins
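
As a hedged sketch of uni- and bivariate views of this dataset, the code below assumes the ggplot2 and palmerpenguins packages are installed and that the `penguins` data frame comes from palmerpenguins. The choice of variables and the bin width are illustrative only.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: provides the `penguins` data frame

# Univariate: distribution of a single numeric variable
p_uni <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 200, na.rm = TRUE)

# Bivariate: relationship between two numeric variables, colored by species
p_bi <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species), na.rm = TRUE)

p_uni
p_bi
```

Setting `na.rm = TRUE` silences the warning about rows with missing measurements; leaving it out is a useful reminder that such rows exist.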

test

Quantitative methods

    1. Experiments and Threats to Validity
    2. Survey Research, Questionnaire
    3. Quantitative Content Analysis

Lectures and workshops

Data collection (Nov 12)

    1. Concept Explication and Measurement
    2. Reliability and Validity
    3. Effective Measurement
    4. Sampling
    5. Content Analysis

Exam question 1

Data analysis (Nov 26)

    1. Experiments and Threats to Validity
    2. Survey Research
    3. Descriptive Statistics
    4. Inferential Statistics
    5. Multivariate Statistics

Exam question 2

9. Experiments and Threats to Validity

  • Random Assignment (p. 225)
  • Between-Subjects Design (p. 227)
  • Within-Subjects Design (p. 228)
  • Treatment Groups (p. 233)
  • Stimulus (p. 233)
  • Control Group (p. 238)

Next steps

Workshop 2, Dec 2

References

Watt, H., and T. Naidoo. n.d. “Data Wrangling Recipes in R.” https://bookdown.org/hcwatt99/Data_Wrangling_Recipes_in_R/#why-data-wrangling-recipes-in-r.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science. O’Reilly Media.
Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media.